A global optimal algorithm for class-dependent discretization of continuous data

Authors

  • Lili Liu
  • Andrew K. C. Wong
  • Yang Wang
Abstract

This paper presents a new method for converting continuous variables into discrete variables for inductive machine learning. The method applies to pattern classification problems in machine learning and data mining. Discretization is formulated as an optimization problem: the objective function is the normalized mutual information, which measures the interdependence between the class labels and the variable being discretized, and its optimum is found by fractional programming (iterative dynamic programming). Unlike most class-dependent discretization methods in the literature, which find only a local optimum of their objective functions, the proposed method, OCDD (Optimal Class-Dependent Discretization), finds the global optimum. Experimental results show that the algorithm is very effective for classification when coupled with popular learning systems such as C4.5 decision trees and the Naive Bayes classifier, and it can be used to discretize continuous variables for many existing inductive learning systems.
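
The abstract names the ingredients but not the exact formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' OCDD implementation. Assumptions: (1) the normalized mutual information is taken to be R = I(C;D) / H(C,D), one common normalization that the abstract does not confirm; (2) the fractional program is solved with Dinkelbach's method, where each iteration exactly maximizes I(C;D) - λ·H(C,D) by dynamic programming over contiguous groups of sorted distinct values; (3) cut points sit at midpoints between adjacent distinct values. Because both I(C;D) and H(C,D) split into a sum of per-interval terms, the inner problem is solved exactly, which is what lets the iteration reach a global rather than a local optimum. The names `optimal_cuts` and its helpers are hypothetical.

```python
import math


def _interval_terms(a, n, class_tot):
    """Contribution of one interval (class counts `a`) to I(C;D) and to H(C,D)."""
    size = sum(a)
    mi = joint = 0.0
    for a_c, n_c in zip(a, class_tot):
        if a_c:
            p = a_c / n  # joint probability p(c, d)
            mi += p * math.log(a_c * n / (n_c * size))  # p(c,d) log[p(c,d) / (p(c) p(d))]
            joint -= p * math.log(p)
    return mi, joint


def _partition_terms(cuts, prefix, n, class_tot):
    """Total I(C;D) and H(C,D) of the partition whose intervals start at bin indices `cuts`."""
    bounds = [0] + cuts + [len(prefix) - 1]
    mi = joint = 0.0
    for i, j in zip(bounds, bounds[1:]):
        a = [hi - lo for hi, lo in zip(prefix[j], prefix[i])]
        t_mi, t_joint = _interval_terms(a, n, class_tot)
        mi, joint = mi + t_mi, joint + t_joint
    return mi, joint


def _best_partition(prefix, n, class_tot, lam):
    """Exact DP: maximize the sum over intervals of (I-term - lam * H-term)."""
    m = len(prefix) - 1  # number of distinct values (bins)
    best = [0.0] + [-math.inf] * m
    back = [0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j):  # last interval covers bins i .. j-1
            a = [hi - lo for hi, lo in zip(prefix[j], prefix[i])]
            t_mi, t_joint = _interval_terms(a, n, class_tot)
            val = best[i] + t_mi - lam * t_joint
            if val > best[j]:
                best[j], back[j] = val, i
    starts, j = [], back[m]
    while j > 0:  # walk back through the start index of each interval
        starts.append(j)
        j = back[j]
    return starts[::-1]


def optimal_cuts(values, labels, tol=1e-12, max_iter=100):
    """Maximize the ASSUMED objective R = I(C;D) / H(C,D) via Dinkelbach iteration."""
    classes = sorted(set(labels))
    if len(classes) < 2:
        return [], 0.0
    col = {c: k for k, c in enumerate(classes)}
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, [0] * len(classes))[col[y]] += 1
    keys = sorted(counts)
    prefix = [[0] * len(classes)]  # cumulative class counts over sorted bins
    for v in keys:
        prefix.append([p + c for p, c in zip(prefix[-1], counts[v])])
    n, class_tot = len(values), prefix[-1]

    best_cuts, lam = [], 0.0  # a single interval has R = 0
    for _ in range(max_iter):
        cuts = _best_partition(prefix, n, class_tot, lam)
        mi, joint = _partition_terms(cuts, prefix, n, class_tot)
        if mi - lam * joint <= tol:  # nothing beats ratio lam: current best is optimal
            break
        best_cuts, lam = cuts, mi / joint  # the ratio strictly increases each step
    thresholds = [(keys[j - 1] + keys[j]) / 2 for j in best_cuts]
    return thresholds, lam
```

For instance, `optimal_cuts([1, 2, 3, 4, 5, 6], list("aaabbb"))` should place a single threshold at 3.5 with R close to 1, since that partition makes the discretized variable and the class determine each other. Each inner DP is quadratic in the number of distinct values, so a practical version would likely restrict candidate cuts (for example, to points where the class changes).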

Similar resources

A dynamic-programming algorithm for hierarchical discretization of continuous attributes

Discretization techniques can be used to reduce the number of values for a given continuous attribute, and a concept hierarchy can be used to define a discretization of a given continuous attribute. Traditional methods of building a concept hierarchy from a continuous attribute are usually based on the level-wise approach. Unfortunately, this approach suffers from three weaknesses: (1) it only ...

Optimal Multiple Intervals Discretization of Continuous Attributes for Supervised Learning

In this paper, we propose an extension of Fischer’s algorithm to compute the optimal discretization of a continuous variable in the context of supervised learning. Our algorithm is highly efficient, since its cost depends only on the number of runs and not directly on the number of points of the sample dat...

A Continuous Plane Model to Machine Layout Problems Considering Pick-Up and Drop-Off Points: An Evolutionary Algorithm

One of the well-known evolutionary algorithms inspired by biological evolution is the genetic algorithm (GA), which is employed as a robust global optimization tool to search for the best or a near-optimal solution within the search space. In this paper, this algorithm is used to solve unequal-sized machine (or intra-cell) layout problems considering pick-up and drop-off (input/output) points. Such p...

Multiclass Spectral Clustering

We propose a principled account on multiclass spectral clustering. Given a discrete clustering formulation, we first solve a relaxed continuous optimization problem by eigendecomposition. We clarify the role of eigenvectors as a generator of all optimal solutions through orthonormal transforms. We then solve an optimal discretization problem, which seeks a discrete solution closest to the conti...

An algorithm for the global optimization of a class of continuous minimax problems

We propose an algorithm for the global optimization of continuous minimax problems involving polynomials. The method can be described as a discretization approach to the well-known semi-infinite formulation of the problem. We ...

Journal title:
  • Intell. Data Anal.

Volume 8

Publication date: 2004